
fix(linkedin): home_feed author body-fallback + drop promoted/suggested/noise rows#1156

Merged
buremba merged 2 commits into
mainfrom
feat/home-feed-cleanup
May 29, 2026

@buremba buremba commented May 29, 2026

What

Two quality fixes for the LinkedIn home_feed connector path (packages/connectors/src/linkedin.ts). The home feed is the one LinkedIn feed that can't use network capture — attaching the CDP debugger stops it rendering — so it relies on the extension's content-script genericScrape against a declarative selector config. That makes everything here inherently heuristic: there's no structured Voyager response to read author/ad fields from, only the scraped row { id, body, author }.

Bug 1 — author came back empty

The old author selector .update-components-actor__title, .update-components-actor__name no longer matches: LinkedIn obfuscates the actor class names in the live feed DOM. I probed the live feed — the actor classes don't match, but the author name is reliably present in the row's body text.

  • Changed the author field selector in HOME_FEED_SCRAPE_CONFIG to a best-effort that catches the actor link's visible name span when present (a[href*="/in/"] span[aria-hidden], a[href*="/company/"] span[aria-hidden]).
  • Added a pure parseHomeFeedAuthor(body) helper that recovers the author from body text when the selector misses: strips a leading "Feed post ", follows "reposted this" to the original poster, then takes the name before the " • " connection-degree marker (capped to 60 chars; also strips a trailing relative-time token like "17h" that appears in repost segments).
  • buildHomeFeedEvents now uses row.author when the DOM selector won, else falls back to the body parse.
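The body-fallback steps listed above can be sketched roughly as follows. This is a hypothetical reimplementation for illustration, not the merged helper: the name parseHomeFeedAuthorSketch, the exact regexes, and the set of relative-time units are assumptions, and only the order of the steps comes from the description.

```typescript
// Hypothetical sketch of the body fallback; not the code merged in this PR.
const MAX_AUTHOR_LENGTH = 60; // cap stated in the description

function parseHomeFeedAuthorSketch(body: string): string {
	// 1. Strip the leading "Feed post " prefix carried by the scraped row text.
	let text = body.trim().replace(/^Feed post\s+/, "");

	// 2. For reposts, continue after "reposted this" so the original poster
	//    is picked rather than the reposter.
	const repostMarker = "reposted this";
	const repostIndex = text.indexOf(repostMarker);
	if (repostIndex !== -1) {
		text = text.slice(repostIndex + repostMarker.length).trim();
	}

	// 3. The name is whatever precedes the " • " connection-degree marker.
	//    Without the marker there is nothing reliable to return.
	const bulletIndex = text.indexOf(" • ");
	if (bulletIndex === -1) return "";
	let name = text.slice(0, bulletIndex);

	// 4. Drop a trailing relative-time token such as "17h".
	//    The unit list here is an assumption.
	name = name.replace(/\s+\d+\s?(?:s|m|h|d|w|mo|yr|y)$/, "");

	// 5. Cap the result so a runaway match cannot become the author.
	return name.trim().slice(0, MAX_AUTHOR_LENGTH);
}

const plainAuthor = parseHomeFeedAuthorSketch(
	"Feed post Jane Doe • 2nd Founder at Acme",
);
const repostAuthor = parseHomeFeedAuthorSketch(
	"Feed post Alice Smith reposted this Bob Jones 17h • 3rd+ Engineer",
);
```

Under these assumptions plainAuthor is "Jane Doe" and repostAuthor is "Bob Jones". A row with no " • " marker yields an empty string, which matches the row.author → body parse → empty fallback chain covered by the tests.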

Bug 2 — promoted/suggested/noise rows became events

The feed mixes in ads, suggestions, and non-post noise (e.g. "Load more comments"). Added a pure isHomeFeedNoise(body) helper and skip those rows before emitting:

  • empty or < 30 chars (drops "Load more comments" etc.)
  • Promoted in the first 130 chars (drops ads like "Feed post Attio 52,728 followers Promoted …")
  • Suggested in the first 30 chars (drops "Feed post Suggested …")

Existing id/body dedupe is unchanged.
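Put together, the event-building flow looks roughly like the sketch below. It is an assumption-laden outline rather than the real buildHomeFeedEvents: the ScrapedRow and FeedEventSketch types are hypothetical, dedupe is reduced to a plain id check, and the two helpers are passed in as parameters so the sketch does not depend on their actual implementations.

```typescript
// Hypothetical outline of the keep/drop flow; types and names are illustrative.
interface ScrapedRow {
	id: string;
	body: string;
	author?: string;
}

interface FeedEventSketch {
	id: string;
	body: string;
	author: string;
}

function buildEventsSketch(
	rows: ScrapedRow[],
	isNoise: (body: string) => boolean,
	parseAuthor: (body: string) => string,
): FeedEventSketch[] {
	const seen = new Set<string>();
	const events: FeedEventSketch[] = [];
	for (const row of rows) {
		// Drop ads, suggestions, and non-post rows before anything else.
		if (isNoise(row.body)) continue;
		// Simplified dedupe by id only (assumption; the real dedupe is not shown here).
		if (seen.has(row.id)) continue;
		seen.add(row.id);
		// Prefer the DOM-selected author; fall back to the body parse when it is empty.
		const fromDom = row.author?.trim() ?? "";
		events.push({
			id: row.id,
			body: row.body,
			author: fromDom || parseAuthor(row.body),
		});
	}
	return events;
}
```

Injecting the helpers keeps the precedence rule (row.author first, body parse second) testable on its own with stub functions.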

Why heuristic

This path can't use the network-capture primitive the other LinkedIn feeds use, so there's no structured author field or ad flag — body-text parsing and substring filtering are the only signals available over a content-script scrape.

Tests

Extended packages/connectors/src/__tests__/linkedin.test.ts with unit coverage for parseHomeFeedAuthor (incl. the repost case), isHomeFeedNoise (the 3 drop cases + a normal keep), and buildHomeFeedEvents end-to-end (keep+drop mix, authors correct, row.author preferred over body parse).

bun test packages/connectors/src/__tests__/   →  41 pass / 0 fail

bunx tsc --noEmit -p packages/connectors/tsconfig.json reports only pre-existing errors (connector-sdk stale-dist "no exported member" + implicit-any in the untouched company-updates/jobs sync code) — no new errors in linkedin.ts. Root tsc --noEmit (pre-commit) passes clean.



Summary by CodeRabbit

Release Notes

  • New Features

    • Improved LinkedIn home feed by filtering out promoted ads and suggestions for cleaner content
    • Enhanced author extraction accuracy for home feed posts
  • Bug Fixes

    • Strengthened error handling for missing Chrome extension dependencies
    • More robust parsing of company updates and job listings


coderabbitai Bot commented May 29, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fa7bac93-5191-421e-aa40-8acb16963848

📥 Commits

Reviewing files that changed from the base of the PR and between ee5ae77 and f3d89bc.

📒 Files selected for processing (2)
  • packages/connectors/src/__tests__/linkedin.test.ts
  • packages/connectors/src/linkedin.ts
📝 Walkthrough

Walkthrough

The PR extends the LinkedIn connector with home feed author extraction and noise filtering heuristics. It adds two new exported utilities, integrates them into event construction, hardens extension dispatcher validation and Voyager API parsing, and updates test coverage accordingly.

Changes

LinkedIn Connector Home Feed and Parsing

| Layer / File(s) | Summary |
| --- | --- |
| Home feed author parsing and noise filtering (packages/connectors/src/linkedin.ts, packages/connectors/src/__tests__/linkedin.test.ts) | Added parseHomeFeedAuthor and isHomeFeedNoise heuristics; integrated into buildHomeFeedEvents to skip noise, deduplicate by component key, and resolve author via row data or body parsing. Updated test setup and expanded coverage for author fallback (row.author → body parse → empty), emoji handling, missing markers, and output capping. |
| Connector sync wiring and extension dispatcher (packages/connectors/src/linkedin.ts, packages/connectors/src/__tests__/linkedin.test.ts) | Enforced chrome_dispatcher presence in sessionState with clear error; updated filter helper for deduplication; refreshed syncHomeFeed, syncUpdates, and syncJobs to report scrape metadata and sort events chronologically. Updated connector tests to verify dispatch payloads, event IDs, metadata, and error messages. |
| Voyager API defensive parsing (packages/connectors/src/linkedin.ts) | Replaced company updates parser with multi-root GraphQL handling: builds URN lookup from included, searches multiple feed roots, defensively extracts metadata. Updated job listings to extract from multiple nesting shapes and derive IDs from URN segments with fallback generation. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • lobu-ai/lobu#1151: Main PR that established the initial home_feed implementation extended by this PR with new author parsing and noise filtering utilities integrated into the same event construction pipeline.

Poem

🐇 A parser hops through LinkedIn feeds,
Extracting names and filtering weeds,
No ads, no noise, just genuine posts,
Authors resolved from what matters most! 🎉

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description is detailed and well-structured, covering the 'What' and 'Why' sections comprehensively, but the 'Test plan' section is incomplete with all checkboxes left unchecked. | Complete the Test plan section by checking which validations were actually run (at minimum 'bun run check:fix' and 'bun run typecheck' or 'make build-packages'). |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main changes: fixing home_feed author parsing with body fallback and filtering noise rows (promoted/suggested). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@buremba buremba force-pushed the feat/home-feed-cleanup branch from ee5ae77 to 03d38e5 Compare May 29, 2026 13:56

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/connectors/src/linkedin.ts`:
- Around line 153-158: The isHomeFeedNoise function is too permissive because it
uses case-insensitive regexes and thus filters posts that mention lowercase
"promoted" or "suggested"; update the two regexes in isHomeFeedNoise to match
the literal capitalized labels by removing the /i flag (change /\bPromoted\b/i
to /\bPromoted\b/ and /\bSuggested\b/i to /\bSuggested\b/) while keeping the
same slice ranges and early-return logic.
- Around line 260-272: The code uses Object.keys(feedRoot) without guarding
against feedRoot being null/undefined; update the logic around feedRoot and the
elements extraction (the feedRoot variable and the elements array assignment
loop) to first ensure feedRoot is a non-null object (e.g., coerce to {} or
return an empty elements array) before calling Object.keys, and skip the
for-loop when feedRoot is null/not an object so the rest of the function remains
defensive and won’t throw on malformed/empty json.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f27ff9d3-9b75-41a9-b44f-dca9a90c249d

📥 Commits

Reviewing files that changed from the base of the PR and between ef359c0 and ee5ae77.

📒 Files selected for processing (2)
  • packages/connectors/src/__tests__/linkedin.test.ts
  • packages/connectors/src/linkedin.ts

Comment on lines +153 to +158
export function isHomeFeedNoise(body: string): boolean {
	if (!body || body.trim().length < 30) return true;
	if (/\bPromoted\b/i.test(body.slice(0, 130))) return true;
	if (/\bSuggested\b/i.test(body.slice(0, 30))) return true;
	return false;
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use a case-sensitive match for the Promoted label to avoid dropping genuine posts.

The LinkedIn ad label is rendered exactly as Promoted (and Suggested). With /i over the first 130 chars, a legitimate post whose body mentions "promoted" early (e.g. a "just got promoted" update) is silently filtered out — and for a feed connector, dropping real posts is worse than letting an occasional ad through.

🛡️ Match the literal capitalized labels
-	if (/\bPromoted\b/i.test(body.slice(0, 130))) return true;
-	if (/\bSuggested\b/i.test(body.slice(0, 30))) return true;
+	if (/\bPromoted\b/.test(body.slice(0, 130))) return true;
+	if (/\bSuggested\b/.test(body.slice(0, 30))) return true;
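The false positive described in this comment can be reproduced with a small standalone check. The two sample bodies are made up for illustration (the ad text follows the "Feed post Attio 52,728 followers Promoted …" shape quoted in the PR description); only the two regexes and the 130-char window come from the code under review.

```typescript
// Illustrative sample rows; not taken from a real feed capture.
const earlyMention =
	"Feed post Jane Doe • 2nd I just got promoted to staff engineer at Acme!";
const adRow =
	"Feed post Attio 52,728 followers Promoted Build your CRM in minutes";

const insensitive = /\bPromoted\b/i; // current behaviour
const sensitive = /\bPromoted\b/; // suggested behaviour

const results = {
	// The genuine post is dropped by the case-insensitive check...
	insensitiveOnPost: insensitive.test(earlyMention.slice(0, 130)), // true
	// ...and kept by the case-sensitive one.
	sensitiveOnPost: sensitive.test(earlyMention.slice(0, 130)), // false
	// The ad label is still caught either way.
	insensitiveOnAd: insensitive.test(adRow.slice(0, 130)), // true
	sensitiveOnAd: sensitive.test(adRow.slice(0, 130)), // true
};

console.log(results);
```

The case-sensitive match narrows the problem rather than removing it: a genuine post with a capitalized "Promoted" in its first 130 chars would still be filtered, which is consistent with the heuristic nature of this path.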

Comment thread packages/connectors/src/linkedin.ts Outdated
Comment on lines +260 to +272
const feedRoot = data?.data?.data ?? data?.data ?? data;
let elements: any[] = [];
for (const key of Object.keys(feedRoot)) {
	const val = feedRoot[key];
	if (val?.["*elements"] && Array.isArray(val["*elements"])) {
		elements = val["*elements"];
		break;
	}
	if (val?.elements && Array.isArray(val.elements)) {
		elements = val.elements;
		break;
	}
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Object.keys(feedRoot) can throw on a null/empty response.

feedRoot falls back to data, so when json is null/undefined (empty or malformed intercepted body), feedRoot is null/undefined as well and Object.keys(feedRoot) throws a TypeError, aborting this parse even though the rest of the function is carefully defensive (?? {}, ?? []).

🛡️ Guard the feed root before enumerating keys
 	const feedRoot = data?.data?.data ?? data?.data ?? data;
+	if (!feedRoot || typeof feedRoot !== "object") return posts;
 	let elements: any[] = [];
 	for (const key of Object.keys(feedRoot)) {
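A minimal standalone sketch of the guarded lookup, assuming the same nesting shapes as the snippet under review. The function name extractElementsSketch is hypothetical, and it returns an empty array where the suggested fix returns posts.

```typescript
// Hypothetical helper mirroring the reviewed loop, with the guard applied.
function extractElementsSketch(data: any): unknown[] {
	const feedRoot = data?.data?.data ?? data?.data ?? data;
	// Without this guard, Object.keys(null) or Object.keys(undefined)
	// throws a TypeError.
	if (!feedRoot || typeof feedRoot !== "object") return [];
	for (const key of Object.keys(feedRoot)) {
		const val = feedRoot[key];
		if (Array.isArray(val?.["*elements"])) return val["*elements"];
		if (Array.isArray(val?.elements)) return val.elements;
	}
	return [];
}

// Shows the failure mode the guard prevents.
let unguardedThrows = false;
try {
	Object.keys(null as any);
} catch (err) {
	unguardedThrows = err instanceof TypeError;
}

const fromEmptyBody = extractElementsSketch(null);
const fromNested = extractElementsSketch({
	data: { data: { feed: { elements: [1, 2] } } },
});
```

Here unguardedThrows ends up true, fromEmptyBody is an empty array, and fromNested is [1, 2], so an empty or malformed intercepted body produces no elements instead of aborting the parse.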

@buremba

buremba commented May 29, 2026

bug_free 88, simplicity 88, slop 0, bugs 0, 0 blockers

Script logs: typecheck/unit/integration all exit 0. Ran bun test packages/connectors/src/__tests__/linkedin.test.ts and git diff --check; both passed. Review focused on LinkedIn home_feed author fallback/noise filtering; no concrete defects found.

Full verdict JSON
{
  "bug_free_confidence": 88,
  "bugs": 0,
  "slop": 0,
  "simplicity": 88,
  "blockers": [],
  "change_type": "fix",
  "behavior_change_risk": "low",
  "tests_adequate": true,
  "suggested_fixes": [],
  "notes": "Script logs: typecheck/unit/integration all exit 0. Ran bun test packages/connectors/src/__tests__/linkedin.test.ts and git diff --check; both passed. Review focused on LinkedIn home_feed author fallback/noise filtering; no concrete defects found.",
  "categories": {
    "src": 61,
    "tests": 183,
    "docs": 0,
    "config": 0,
    "deps": 0,
    "migrations": 0,
    "ci": 0,
    "generated": 0
  }
}

Local review gate — branch protection can require the pi-review commit status. See docs/REVIEW_SCHEMA.md.

@buremba buremba merged commit 858e8e0 into main May 29, 2026
26 checks passed
@buremba buremba deleted the feat/home-feed-cleanup branch May 29, 2026 14:06
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
